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Abstract 



Cd Motivated by modern observational studies, we introduce a class of functional mod- 

'~~' els that expands nested and crossed designs. These models account for the natural 

^ inheritance of correlation structure from sampling design in studies where the funda- 

m 

00 mental sampling unit is a function or image. Inference is based on functional quadratics 

l> 

^O and their relationship with the underlying covariance structure of the latent processes. 

^' 

^^ A computationally fast and scalable estimation procedure is developed for ultra-high di- 

cn 

^~~i mensional data. Methods are illustrated in three examples: high-frequency accelerom- 

> 

•^ eter data for daily activity, pitch linguistic data for phonetic analysis, and EEG data 

*^ for studying electrical brain activity during sleep. 
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1 Introduction 

In many current studies, functional measurements have well defined stochastic structure 
induced either by the experimental design or by the scientific meaning of the data. For 
example, the Sleep Heart Health Study (SHHS) Quan et al. (1997); Crainiceanu et al. 
(2009); Di et al. (2009)1 collected electroencephalograms (EEG) data for thousands of 
subjects at two visits, roughly five years apart. At every visit, EEG data were recorded at 
125Hz during sleep. Thus, for each subject and visit, data consist of 125 observations per 
second, or 0.45 million per hour. Crainiceanu et al. (2009) applied a Fourier transformation 
to the original data and obtained the normalized (5-power as a densely-sampled stationary 
time series. These data have a natural hierarchical structure induced by the replicated visits 
within each subject. To be more precise, one can denote the (5-power function for visit j of 
subject i at time t after sleep onset by l^j(t). Vjj(t) can be decomposed into a subject-specific 
process Xi{t) and a visit-within-subject process Uij{t) that quantifies the deviation from the 
subject-specific mean. 
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Figure 1: Activity intensity measurements over 5 days for 4 subjects. Original data has 10 observations 
per second. This plot shows the average of activity intensity in non- overlapping 15 minute intervals for 
improved display clarity. 



A second example is provided by Bai et al. (2012) in a recent study of physical activity 



in an elderly population. In this study, each subject wears an accelerometer that records 
three- axis accelerations during in-home activities at a sampling frequency of lOHz. After nor- 
malization with respect to at-rest variability, Bai et al. (2012) introduced activity intensity, 
a measure of activity expressed in multiple of signal standard deviations during inactivity 
periods. Activity intensity is calculated in every tenth of one second interval. Figure 1 dis- 
plays the activity intensity for five subjects during a 5-day period averaged over 15 minutes 
for improved display clarity. One possibility of analyzing these data is to focus on activity 
intensity in non-overlapping one-hour intervals. Thus, the data for each subject contains five 
days, each with 36,000 activity measurements per hour for 24 hours. This can be viewed 
as a three-level hierarchical structure: hour within day within subject. More specifically, let 
Yijk{t) be the activity intensity at time t within hour k on day j for subject i. In addition to 
the subject-specific process Xi{t) and the day-within-subject process Uij{t), the remaining 
part of the variation in Yijk{t) can be explained by the hour-specific process Wijk{t) that 
quantifies the deviation of hour k from the average of day j for subject i. 

Aston et al. (2010) described a different study of phonetic analysis where the authors 
were interested in studying the fundamental frequency (FO, 'pitch') of spoken language. In 
particular, they recorded the FO-contours of syllables from 19 nouns pronounced by 8 native 
speakers of the Luobuzhai Qiang dialect in China. Suppose that we use Yij{t) to denote the 
pitch of syllables j at normalized vowel time t spoken by subject i. Each Yij(t) was measured 
at 11 equidistant time points across the duration of a vowel. The value of Yij{t) is jointly 
affected by at least two components: the syllable-generic effect Xj(t) relating to the word, 
tone, stress and intonation of the syllable, and the speaker-inherent effect Zj{t) that could 
depend on age, gender etc. Unlike the hierarchical framework in the previous examples, 
these two latent processes can be assumed to be mutually independent while they can have 
interactive effect on the pitch contours. Figure 2 displays an example of FO-contours for 
vowels that compose three different words spoken by three speakers. The color stratification 
indicates that some speakers have, on average, a lower pitch than others. The shape of the 



curves are strongly correlated with the vowels and words that they belong to. For instance, 
within vowel 'i', curves from word 3 all display a steep rising pattern and go down at the end 
of the vowel. But curves from word 2 (labeled by the triangle symbol) are all arch-shaped. 
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Figure 2: FO-contours for 3 words (triangles for word 2, dots for word 3, and multiplication signs for word 
4-) spoken by 3 speakers (pink for speaker 'a', black for speaker 'c', and gray for speaker 'g'). Each contour 
was measured at 11 equal distant time points within a vowel ('a'/a'/e'/i' or 'u') when a particular word 
was spoken by one of the eight speakers. Every word was repeated under three different contexts. 



Although these three studies have different designs, they share some common features: 
1) the fundamental observation unit is a function that can be very high-dimensional; 2) data 
has a known structure induced by the sampling design; and 3) analysis of individual levels 
of variability is of interest. One goal of this paper is to define a wide class of structured 
functional models with explicit functional effect components; in particular, this class will 
contain the observed structures in the three examples. We will focus on the common struc- 
tures and provide a consistent statistical framework for all these models. A second goal is to 



characterize the observed variabihty by uncorrelated latent processes, whose covariance esti- 
mators are obtained using method of moments estimators from the observed data. Through 
estimating and diagonahzation of these covariance operators, we will achieve both dimen- 
sionality reduction of the original data and statistical modeling on the induced linear spaces. 
From an intuitive perspective, this paper shows how to conduct statistically principled PCA 
when the data matrix has a particular known and common latent correlation structure. 

The structured functional models in this paper fall into functional linear mixed model 
(FLMM) framework. Earlier work [Guo (2002); Herrick & Morris (2006); Morris & Carroll 
(2006)] mainly used splines or wavelets smoothing in model fitting. Brumback & Rice (1998) 
and Guo (2004) have specifically studied functional nested and crossed designs. More recent 
work like Staicu et al. (2010) and Zhou et al. (2010) considered spatial correlation in the 
nested model. While all these models could be viewed as particular cases of FLMM, model 
fit and inference remains difficult and it is currently done on a model-by-model basis. We 
conclude that none of these previous papers have addressed the class of complex functional 
structures we discuss here. Moreover, we are the first to propose very fast approaches 
for high-dimensional data. We aim to introduce a data-driven approach that applies to 
both nested and crossed designs, but is generalizable to a much broader model space. We 
achieve this by introducing latent processes that capture explicit levels of variability using 
the same concept from standard mixed effects models. The only difference is that random 
effects are now replaced with random processes. Computational feasibility is achieved via 
principal component decomposition of latent processes covariance operators and by loss- 
less projections for ultra-high dimensional data. These approaches are methodologically 
related to the PCA decomposition. [Staniswalis & Lee (1998); Yao et al. (2003); Yao 
et al. (2005) ;Di et al. (2009) (MFPCA); Greven et al. (2010) (LFPCA) ; Aston et al. 
(2010)]. Aston et al. (2010) projects the whole function onto a vector space, where the 
vector entries are the first few principal scores of the function. Through several linear mixed 
effect models which link PC scores and the covariates, they are able to assess the effect of 
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covariates on the outcome function. Alternatively, MFPCA decomposes the intra-subject 
and inter-subject covariance operators in the two-way nested model, while inference is based 
on the scores separated by levels of variability. Using a similar approach, LFPCA models the 
longitudinal dynamics of functional observations measured at multiple visits. In this paper, 
we generalize these ideas to analyze functional observations collected under the most common 
nested and crossed designs, which expands the number and type of models for functional 
data. We propose structured functional principal component analysis (SFPCA) as a method 
to decompose the variability via PCA for any functional model with a particular linear 
structure. An attractive computational advantage of SFPCA is efficient handling of dense 
and high-frequency functional measurements. Throughout the paper, we only consider noise- 
free models because: 1) for low-dimensional data with white noise, the latent processes are 
estimable through smoothing; 2) for noise contaminated high-dimensional data, the problem 
is currently unresolved. To study the effect of this choice, we provide a simulation study in 
Section 4.2 for low-dimensional data with noise. 

We organize the paper as follows: in Section 2, we provide a list of structured func- 
tional models that SFPCA is applied to and connect them to the symmetric sum method of 
moment (MoM) estimators described in Koch (1968); Section 3 discusses SFPCA and its im- 
plementation, with its extension to high-dimensional settings; Section 4 describes simulation 
studies for low-dimensional, high-dimensional and noisy settings; Section 5 applies SFPCA 
to the scientific problems described in the Introduction. 

2 Structured Functional Models 

Koch (1967) provided a comprehensive list of linear models for scalar data that emerge from 
various experimental designs. We contend that these models have natural extensions to 
functional data and that the models may be analyzed by decomposing the corresponding 
covariance operators. Table 1 includes the proposed designs that are grouped based on 



sampling schemes. 





(Nl) One-way 


Y,{t) = fi{t) + X,{t) 


Nested 


(N2) Two-way 
(N3) Three-way 






(NM) Muhi-way 


y.,.,...,(t) = /.(t) + R^\t) + i?a(t) + • • • + R^X,^{t) 




(C2) Two-way 


Yj{t) = fi{t) + Xi{t) + Zj{t) + Wij{t) 


Crossed 


(C2s) Two-way with subsampling 


Y^.kit) = Kt) + Xiit) + Zj{t) + W^J{t) + Uijkit) 




(CM) Multi-way 


Y,i,...iAt) = Kt) + RrAt) + RT2it) + --- + RrAt) 



Table 1: Structured functional models. For nested models, i = 1,2,- ■ ■ , I; j = 1,2,- ■■, Ji;k = 
1,2,-- ■ ,Kij; ii ^ l,2,---,/i, 22 = 1,2,- • • ,/2ii,. . . ,ir = 1, 2, • • • , /riiia-.-v-i • Po^ crossed designs, 
i — 1,2, ■ ■ ■ , I; j — 1,2, ■ ■ ■ , J;k — 1,2, ■ ■ ■ , Uij; (CM) contains combinations of any s (s — 1, 2, • • • , r) subset 
of the r latent processes, as well as repeated measurements within each cell. 7i , 72 , • ' ' ,Td ^ {iki *fc2 ' ' ' ^fe, u : 
ki, fc2, • • • ,ks (z (1, 2, • • • , r), u G (0, 1, 2, • • • , Ii-^^i^...i^), s < r}, u is the index for repeated observation in cell 



As shown in Table 1, the observed outcome, Y{t), is expressed as a linear combination of 
latent processes X{t), U(t), Z(t), and/or W(t). We assume that these processes have mean 
zero and are square integrable, which guarantees their identifiability and mirrors standard 
statistical assumptions for scalar outcomes. Consequently, the total variability of a functional 
outcome is decomposed into process-specific variations. The covariance operators of latent 
processes define functional variance components of each model. Hence, these models capture 
a wide variety of correlation structures in modern functional data studies. We build up the 
intuition behind the functional nested and crossed designs, and connect them to the data 
examples that are discussed in the introduction. 



2.1 Nested designs 

A one-way nested model (Nl) is the simplest variance component model for functional data. 
In (Nl), the observed functional outcome Yi{t) is represented as a sum of a deterministic 
mean function, /i(t), and a level-specific stochastic process Xj(t). Xj(t) are assumed to be 
i.i.d, with mean zero and covariance operator Kx(t, s) = E{Xj(t)Xj(s)}; Kx may be thought 
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of as the functional counterpart of scalar covariance. The variability of Yi{t) is completely 
determined by that of Xi{t), that is, Ky = Kx- In conventional functional data analysis 
(Ramsay & Silverman, 2005), Xj(t) would be expressed via a set of spline or wavelet basis, 
or data-driven principal components [Ramsay & Silverman (2005); Di et al. (2009); Greven 
et al. (2010)] . Irrespective of the basis functions, Kx is determined by the first two moments 
of the representation coefficients and a quadratic form of the basis functions. 

The two-way functional nested design (N2) is the functional equivalent of a one-way anal- 
ysis of variance (ANOVA) model. Originally motivated by the two-way sampling design of 
EEG data in SHHS (Di et al. , 2009), the model expands (Nl) with a subject- visit specific 
process Uij{t) that has covariance Ku{t,s) = Cov{Uij{t),Uij{s)}. Thus, the observed total 
variability of Yij{t) is decomposed into subject-specific and subject- visit specific variability. 
These two parts are modeled through the corresponding functional covariance components 
Kx and Ku - covariance operators of Xi(t) and Uij{t). To ensure identifiability, the ran- 
dom processes Xj(t) and Uij{t) are assumed to have mean zero and be un correlated. This 
assumption also guarantees that Ky = Kx + Ku. 

Additional levels of nesting can be included in the model to accommodate corresponding 
hierarchies. For example, the three-way nested model (N3) provides an appropriate frame- 
work for modeling the activity intensity data described in Section 1. In addition to the 
subject-specific process Xi(t) and the subject-visit specific process, Uij(t), the remaining 
part of the variation in Yijk{t) is modeled through Wijk{t), which quantifies the hourly de- 
viation from the average activity intensity level of day j for subject i. The most general 
functional nested model (NM) admits arbitrarily many levels of nesting. If the activity in- 
tensity is followed for weeks or even months, a four-way or five-way model may be more 
appropriate, given the possibly repeated patterns of activity from week to week or month to 
month. As in the preceding models, mutual non-correlation is imposed for model identifiabil- 
ity. The total variability is decomposable into level-specific functional variance components 
as Ky = Ki + K2 + --- + Kr, where Kr{t, s) = Cov{i?5'l.^(t), i?5[.^^(s)}. Here we used the 



notation from Table 1 for the multilevel model with an arbitrary number of levels (NM). 

2.2 Crossed designs 

Another group of designs admits crossing between levels. For example, the two-way crossed 
design (C2) is a functional analog of two-way ANOVA with an interaction term. It empha- 
sizes a joint effect of two uncorrelated processes Xj(t) and Zj{t), as well as their interaction 
Wij{t), on the outcome Yij{t). The two-way crossed model with sub-sampling (C2s) applies 
to experimental designs where repeated measurements occur within each combination {i,j) 
induced by the two first-level processes Xj(t) and Zj{t). In addition to the first-level crossing 
Wij{t) as in (C2), Uijk{t) models the replications within each cell. For the phonetic analysis 
example, Xj(t) and Zj(t) can model the main effects of syllable i and speaker j, respectively, 
while Wij{t) models their interaction. Since multiple FO-contours may fall in category {i,j), 
we use Uijkit) to capture the remaining effect. 

In general, we can consider an m-way crossed functional model (CM) with arbitrary 
number of crossings. In this model, q {q > 2) uncorrelated latent processes have exchangeable 
first-level effects on Y{t). Any subset of s (s < q) processes out of q may have interactions, 
resulting in d functional additive terms in the model. For notational convenience, we express 
this model using d sub-index sets, {7i, 72, . . . , 7^} that define the model structure. For 
example, (C2s) with four terms can be written as {7i, 72, T3, 72} = {i, j, ij, ijk} and Rji (t) = 
Xi(t), -R7^(t) = Zj{t), RTi{t) = Wijit), and RrAt) = Uijkit). The assumptions on correlation 
structure stay the same; all latent processes, including interaction terms, are uncorrelated. 
We now show how to efficiently estimate these models. 

3 Structured Functional PCA 

We develop structured functional PCA (SFPCA) to efficiently reduce dimensionality and 
extract signals for the class of functional models introduced in Section 2. This approach 
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models latent processes parsimonously via principal components by Karhunen-Loeve ex- 
pansion [Karhunen (1947); Loeve (1978)]. SFPCA starts with estimating the covariance 
operators of latent processes. Following Koch (1968), we employ the MoM approach based 
on symmetric sums. By extending his approach to functional settings, we construct unbiased 
estimators of covariance matrices on a grid of p points T = {ti,t2, ■ ■ ■ , tp}. After estimating 
the covariance operators, we conduct spectral decomposition to obtain principal components 
and principal scores that serve as coordinates in the space spanned by principal components. 
We use two-way crossed design (C2) as the main example. Details for other models in Ta- 
ble 1 can be found in the Appendix. Let n{t) be the fixed population mean, Xi{t), Zj{t) and 
Wij{t) be mutually uncorrelated mean- zero random processes as described in Section 2. Their 
covariance operators are Kx, Kz and Kw, respectively, where Kx(t,s) = E{Xj(t)Xj(s)}, 
Kzit.s) = E{Zj{t)Zj{s)} and Kwit,s) = E{Wij{t)Wij{s)}. Using the Karhunen-Loeve 
expansion for Xi{t), Zj{t), and Wij{t), model (C2) becomes 

oo oo oo 

fc=l 1=1 m=l 

where (f)^{t), (f)f{t) and (/)m{t) are the eigenfunctions of the covariance operators Kx, Kz 
and Kw, respectively. The scores S,^ = j Xi{s)(j)^{s)ds, ^fi = J Zj{s)(j)f{s)ds, and ^^^ = 
J Wij{s)(f)^{s)ds are mutually independent random variables with mean and variance A^, 
Af , and Aj^, respectively, where A^ > A^^, Af > A^^, and \^ > X^+i ^oi every k, I, and 
m. Normality of scores is not necessary for the results in this paper, but may be a convenient 
mild assumption. 



3.1 Principal scores estimation 

Consider the case when most variability of each latent process is captured by the first A^i, 
A^2 and A^3 principal components of Xj(t), Zj(t) and Wij{t), respectively, model (1) can 

Ni N2 N3 

then be approximated as F,,(t) = /i(t) + E 0f (^)Cf + E <PfitKfi + E C(^)CTm- Let Y = 

k=l 1=1 m=l 
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(Yii, . . . , Yijj, ■ ■ ■ , Y/i, . . • , Y/jJ be a pxn matrix with Yij := {Yij{ti),Yij(t2), ■ ■ ■ ,Yij{tp)}'^ 
and n = X]i=i "^j- For notational simplicity we assume a balanced design where Jj = J, 
though the assumption is not necessary. Let ^f = {^^, ■ ■ ■ , Cj^J"^ and #x = [0i^('7~), 4>2 0') 
, ■ ■ ■ , 0jy (T)] be the first A^^i principal components observed at time grid T. Similar defini- 
tion applies to (^^, ^z) and (^|J, $vk)- Hence the truncated model is further expressed into 
matrix form as Yij = fi®!^ + $x^f + ^z^f + ^w^^ ■ 

We will show in the next section how to obtain Kx, Kz and Kw- Assume now that such 
estimators are available and let ^x i ^z ^^^ ^w ^^ their first A^^i, A'^2 and N^ eigenvectors, 
respectively. Denote the vectors of first A^^i, N2 and N3 eigenvalues for the covariance matrices 
as \x , A^ and A^y . We can estimate the truncated set of principal scores to be the best 
linear unbiased predictor (BLUP) of the mixed effect model Yij = fi^l^+^x ^f" +^z ^f + 
^w^^^ where ^f ~ A^(0, aJ), ^f ~ iV(0, A^') and ^^ ~ A^(0, A^^Jf); fi may be obtained 
by sample average. Specifically, for the two-way crossed model (C2) and three-way nested 
model (N3), their estimated BLUPs are provided in Appendix A. 



3.2 MoM covariance operator estimation 

By extending the idea of symmetric sum MoM estimators in Koch (1968), we show that 
our estimated covariance matrices will be of the form Kx = YG^Y-^, Kz = YH^Y"^ and 
Kw = YGvi/Y^, where Gx, Gz and Gw are specific matrices of dimension nxn. In fact, for 
all the structured functional models, MoM estimators of covariance operator is representable 
in the "sandwich" form, YGY-^. We illustrate the detailed calculation for the covariance 
operators for the two-way crossed design (C2) and three-way nested design (N3). Results 
for other design schemes are provided in Appendix B. We start by subtracting an estimator 
of the mean fi{t) and hence assume fi{t) = from now on. 

3.2.1 Two-way crossed design (C2) 

For model (C2), we have 
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E{Y,j{t) - Yklm{iy^jis) - Ykiis)} 



2{Kw{t,s)+Kz{t,s)}, [U = k,j^l 

2{Kw{t,s)+Kx{t,s)}, i{i^k,j = l 

2{Kw{t, s) + Kz{t, s) + Kx{t, s)}, ifi^kj^l 



Let fiij = 1 if Yij is observed and otherwise; n^o = ^ifiij, uqj = '^jUij, n = ^^jUij, 
^1 = Ei^fo and k2 = Ej^oi- Define D„xn = diagjNi, N2, ■ ■ ■ ,Ni} with N^ = njoI„,o) 
E/xn = diagjlj^^,, ■ ■ ■ , l^jj and 1„ = (1, 1, ■ ■ ■ , 1)'^. P„xn = diagjPi, ■ ■ ■ , Pi} with P, = 
diagjnoi, ■ ■ ■ , nonio} of dimension riiQ x riio, Fjxn = (fi, ■ ' ' 5 fj)^ is the second-level analogy 
to E, where fj is a vector with value 1 on observations with second-level process Zj{t) and 
otherwise. If Hz = 2{Kw + Kz), Hx = 2{Kw + Kx), and Hxz = 2{Kw + Kz + Kx) then 
using the results above we obtain the following explicit MoM estimators 

Hz = ^^EE(Y..-Y.O(Y.,-Y,,f 



ki — n 



i=i j^i 



Hx 
Hxz 



_ n 2^ 2^^ '■^ ^'sil^^i '^kj, 



ko-n 



i^k j 
1 



ki — n 

2 

ko-n 



Y(D - E^E)Y^ 



-YfP-F^F)Y^ 



n"^ — ki — k2 + n 



7 , Z_^C^ij ~ YfcOlYjj - Yfc/) 



i^fc jy/ 



-Yfnl - 11' 



D + E^E - P + F^F)Y^ 



n"^ — ki — k2 + n 
Thus, the covariance operators can be estimated as Kz = {Hxz — Hx)/2 = YG^Y 



-X 



(Hxz - Hz)/2 = YGxY^ and Kw = (Hx + Hz - Hxz)/ 2 = YG^^Y, where 



G; 



1 



-{nl-ll^-D + E^E- 



F^F-^(D-E^E)}andGH^ 



n?—k 
k2-n 



i(P_F^F)},Gx 



•n? — ki—k2+n 



{nI-11^ 



1 



n^ — fci — fc2 +n L k2—n 



P+F^F)4 



n^-fc2 , 



ki—n 



D-E^E)-nI+ll^}. 



3.2.2 Three-way nested model 

Consider now model (N3), where Yijkif) = Xj(t) + Uij(t) + Wijk(t),i = 1,2, ■■■ ,/;j = 
1, 2, ■ ■ ■ , Jj; /c = 1, 2, ■ ■ ■ ,nij, and X, U and W to be the three latent processes nested within 
each other. Similar to the approach for (C2), we have 
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E{Y,jkit) - Yiu,it)}{Y,jkis) - Yiuvis)}' 



2K]v{t,s), ii i = l,j = u,k ^ V 

2{Kw{t,s)+Ku{t,s)}, [ii = l,j^u 

2{Kx{t, s) + Kw{t, s) + Ku{t, s)}, iii^l 



Again let Yijk ■= {Yijk{ti), ■ ■ ■ , Yijk{tp)Y , n := Y!i,j nij, rn. = Y.f «ij, h = ^f^^ Ylf=i ^Ij^ 
^2 = Ei^f-- Di = diagjNii,--- ,Nimi} and D2 = diagjNi, ■ ■ ■ ,Ni}, where Nij = Uijlmj 
and Ni = m.In,,; Ei = diag{l^^^, ■ ■ ■ ,1^:^^}, E2 = diag{l^^ , ■ ■ ■ , l^J. If Hw = 2Kw, 
Hjj = 2{K\Y + Kjj), and Hj. = 2{Ky/ + Kz + Kx) then using the results above we obtain 
the following MoM estimators 



Hw 
Hu 



/ J / \^ijk ^ijv)\^ijk ^ijv) 



k\ — n 



i,j k,v 



k\ — n 



-Y(Di - Ef Ei)Y 



7 / J / J / j \ ijk ^iuv)[j^ijk ^n 



i jy^U k,v 

Hx = o /, / ^ / _, {Yijk — Yiuv)iYijk — Yu 



k2 - ki 
2 



Y(D2 - E^ E2 - Di + E{ Ei)Y 



n^ — h 



2 



i^l j,u,k,v 



n^ — fe 



Y(nl-ll^ -D2 + E^E2)Y^ 



Then kw = Hw/2, Ku = {Hu - Hw)/2 and Kx = {Hx - Hu)/2 all have the form YGY^. 
In general, multi-way nested and crossed designs can be estimated through a similar work 
flow (see Appendix B for details). 

3.3 Structured high- dimensional data 

Given the current research emphasis on high-dimensional data, linear models are still con- 
sidered to be difficult to fit for high-dimensional data. Here we show that the entire class 
of models described in Table 1 can be fit using fast approaches. Note that the estimation 
procedures in the previous sections assume that the MoM estimators of the relevant covari- 
ance operators can be constructed and decomposed. When the dimension of observations, 
p, is moderate, the methods described in Section 3 are straightforward. However, if the 
observations are high or ultra-high dimensional, the approach is no longer feasible. Because 
calculating and storing a p-dimensional covariance operator Kp^p is computationally expen- 
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sive, while its spectral decomposition becomes prohibitive. Thus, we propose an alternative 
approach based on a rank-preserving argument. This algorithm allows efficient calculation 
of the eigenvectors and eigenvalues without requiring either storing or diagonalizing the 
estimated high-dimensional covariance operators. The algorithm is outlined below. 

Throughout this section, we assume that p ^ n. Hence, the induced covariance ma- 
trix is at most of rank n. Zipunnikov et al. (2011) proposed an approach that avoids 
calculating the covariance operators in the original high-dimensional space. Consider (C2) 
as an example: the idea is to map the model onto a lower-dimensional space and obtain 
Yij := CYij = C$^^f + C#^^f + C^^^^. The matrix C can be anything as long as 
the left (outer) dimension is small and the right (inner) dimension is equal to the dimension 
of the problem. One possible choice of C would be to start with the whole data matrix, Y, 
which can be obtained by column binding individual data vectors, Yij. If Y = VS^/^U"^ is 
the singular value decomposition of Y, then we can simply choose C = V"^. The model be- 
comes Si/2U^ = V^$^^f + V^$%^ + V^$^^U := A^^f + A%^ + A^^|J. Theorem 1 
in Zipunnikov et al. (2011) shows that this transformation preserves full information for the 
linear PCA model. The eigenfunctions for the original model can be recovered by left multi- 
plying V to those obtained in the new model and the eigenvalues remain unchanged. This is 
straightforward to implement, as the number of operations involved in calculating the SVD 
of Y is linear in the dimension of the data, p. After obtaining the SVD of Y, each column 
Yij can be represented as Yij = VS^/^Ujj, where Ujj is a corresponding column of matrix 
U^. Therefore, the vectors Y^- differ only via the factors Ujj of length n, which is much 
lower-dimensional. Comparing this SVD representation of Yjj with the original model (C2), 
it follows that the structured separation of the variability modeled by high-dimensional latent 
processes Xi, Zj, and Wij is identically in the structured separation of the low-dimensional 
vectors Uij. This is the key observation which motivates our approach. This model has an 
"intrinsic" dimensionality that is induced by the sample size n. This low-dimensional model 
is estimable using SFPCA in Section 3 and requires only 0{n^) calculations. 



14 



After obtaining A;^^, A^^ and A]^^ as the estimates for A'''-, A^ and A'^, we can obtain 
^j , ^ • and ^j- as induced BLUP in the lower-dimensional model, with the matrices A's 
replaced by their corresponding estimates. Furthermore, $7v^, $jv2 ^^^ ^Ns ^^ ^^e original 
space may be recovered by left multiplying V onto A;^^, A^^ and A^^. We provide the 
formula for final estimates and their detailed derivation of two-way crossed model (C2) and 
three-way nested model (N3) in Appendix A. Up until the last step, all the calculations can 
be conducted in 0{n^) complexity. Therefore, fitting the model in a reduced-dimensional 
space guarantees the high-dimensional principal components in a p-linear time. This means 
that complex statistical models for ultra-high dimensional data sets can be fitted quickly. 

4 Simulations 

To better understand how SFPCA performs in practice, we consider both low- and high- 
dimensional simulation scenarios. We conduct simulation studies for the two-way crossed 
design (C2) and three-way nested model (N3), and evaluate the estimation under various 
signal-to-noise ratios. 

4.1 Models without white noise 

For the two-way crossed design (C2), we generate data from the following model 

Nx Nz Nw 

k=l 1=1 h=l (2) 

e.. '- iV(0, A^), 0, '^ N{0, \f) and ^.,. '^ N{0, Af) 

where ^ikS, Qi^s and rjij^s are mutually uncorrelated. We choose Nx = Nz = Nw = 4 and 
let the true eigenvalues be ^k — '^k — ^fe^ — O.S'^"^ A; = 1, 2, 3, 4. True eigenfunctions are 
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(/if (i) = sin(27rt) (j}^ (t) = cos(27ri) (j)^ (t) = sin(47ri) 

</>f (i) = sin(67rt) (/)f (t) = cos(67ri) ^f (t) = sin(87ri) 

0f (t) = V3(2i - 1) (/)f (t) = \/5(6t2 - 6t + 1) 0f (t) - \/5(20t^ 



(/)f (t) =cos(47ri) 
</.f (i) = I/V2 
30t2 + 12t-l) </)f(i) = l 



These functions are measured on the grid T = 1/p, 2/p, ■ ■ ■ ,1. For the low-dimensional 
case, we choose p = 100. Let / = 200 and J = 20 be the number of categories for first-level 
processes X{t) and Z{t), respectively. Figure 3 shows an example of the simulated curves. 





0.0 0.2 0.4 0.5 0.8 1,0 



0.0 0.2 0.4 0.5 0.8 1.0 



Figure 3: An example of the simulated functional data following model (C2) as listed in equation (2), 
with parameters specified in the text. The three curves within each panel share the same X{t), but vary on 
Z{t) and the interaction term W{t); curves with the same color across two panels share the same Z{t), hut 
may vary on others. Similarities within each panel as well as on second-level effect is observable. 

We repeat simulation for 100 times and display the estimated principal components for the 
three latent processes in Figure 4. The procedure recovers X{t) and W{t) very well. Given 
the small number of samples that is observed for 2'(t), estimation for Kz and therefore its 
eigenfunctions are more noisy and unstable. This can be resolved by increasing J . In fact, we 
examine another two scenarios where / = 20, J = 200 (upper panel of Figure 9 in Appendix 
C) and I = J = 200 (bottom panel) to evaluate the effect of sample size on recovering each 
variance component. The reliability in estimating the eigenfunctions increases when there 
are more observations per latent process. 
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Figure 4: The estimated principal components when I — 200 and J — 20 in 100 simulations are shown in 
gray bands. Black curves are the true eigenf unctions. 

For the three-way nested model (N3), we study the high-dimensional scenario using a 
similar parameterization with example. Consider 



Ni N2 N3 

1=1 m=\ h=\ 

ia -' iV(0, Ap)), 0,>„ -' iV(0, A(^)) and r^.,,, ^^^^ iV(0, \f) 



(3) 



where Uij(tys (j = 1,2,--- , J) are nested within i, and H^ijfe(t)'s (A; = 1,2, ■■■ ,i^) are 

nested within ij. ^u's, Ojm's and rjij^h^ are assumed to be uncorrelated. 

Let Ni = N2 = N3 = 4, a1'^ = Af = A;, 
eigenfunctions 



.(3) 



0.5''-\k = 1,2,3,4., and the true 



4>f {t) = sin(27ri) (j)^ (i) = cos(27ri) (^3 (t) = sin(47ri) 

^f (i) = sin(67ri) <?!)^(i) = cos(67rt) c/)^ (t) = sin(87ri) 



0f (t) := cos(47rt) 
<P^{t) = I/V2 



rw = i 



(j>Y (t) ^ V3{2t - I) 0f (i) = y5(6t2-6i+l) 0f (t) = \/5(20i3-30i2 + 12f-l) 
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0.0 0.2 0.4 0.6 0.8 1.0 



0.0 0.3 0.4 0.6 0.8 1.0 



Figure 5: Simulated functions following the three-way nested model as listed in equation (3). The three 
curves in the upper left panel are random samples that share the same first- and second-level processes. 
Their correlations across curves are induced by both X{t) and U{t). On the upper-right panel are samples 
with the same first-level process X{t) only, but vary on the second- and third-level process U{t) and W{t). 
The correlation among these five curves is much weaker than the previous plot. The bottom-left plot are 
those share the same first- and third-level processes X{t) and W{t), but vary on the second-level process 
Z(t). Finally, the bottom-right plot shows three completely uncorrelated curves from the model. 




0.0 0.4 



Figure 6: The estimated principal components under high- dimensional setting over 100 simulations are 
shown in gray (we randomly plot 50 out of 100 estimates) . The true eigenf unctions are displayed in black 
curves. The first- and second-level hierarchies are captured by two sets of trigonometric basis. The third- 
level processes is polynomial. We deliberately use a thinner line width for the black curves on the third row, 
so that the estimated PCs in gray are still visible. 



With p = 35, 000, I = 50, J = 5 and i^ = 5, we display the simulated data under various 
nested structures in Figure 5 and the estimated principal components in Figure 6. Note 
that the estimated eigenfunctions are in fact the linear combinations of eigenfunctions from 
other levels. For instance, the estimates for constant functions 0f (t) and 05^ (t) demonstrate 
trigonometric shape. The deviations, although small on the scale of the functions, can 
diminish as sample size increases. 

4.2 Models with white noise 

So far have assumed that data are measured without noise, or that noise has a smooth 
covariance structure that can be absorbed into one of the latent processes. However, SFPCA 
can easily be extended to data contaminated by white noise e ~ A^(0, cr^). The only difference 
is that the new symmetric sum operators H have larger diagonal values, i.e., H{t,s) = 
H{t, s) + cr'^Sts, where 6ts is the Dirichlet function. We estimate H{t, s) by smoothing the 
off-diagonal surface of H as, for example, in Staniswalis & Lee (1998). For high-dimensional 
case, although the white noise remains i.i.d with the same variance after left multiplying 
with an orthonormal matrix as described in Section 3.3, we encounter multiple difficulties. 
Therefore, we do not consider high-dimensional case with noise for the time being. 

Specifically, for model (N3), we have K\Y(t,s) = Kw(t,s) + cr'^6ts, Kjjit.s) = Kij(t,s), 
Kx{t,s) = Kx{t,s). Thus, we only need to smooth the off-diagonal elements of Kw to 
estimate cr^. We show a simulation example of SFPCA for the N3 model under the same 
setting as in the previous section. The mean square errors (MSEs) of the estimated principal 
components and white noise variance are in Table 2. 

To better summarize the signal-to-noise ratio, we define reliability ratio (Shou et al. , 
2012) to be the ratio of variances across all observed time points between processes X{t), 
U{t) and W{t), and the observed outcome RR = tracclK^ +k +k +a'^\ ' Similar summary 
statistics can be defined for other models listed in Table 1. 
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Table 2: Average MSE of the first 4 principal components under different signal-to-noise ratio. 

5 Data Applications 

SFPCA can be applied to various types of structured data including the three examples 
discussed in the introduction. The SHHS data was analyzed in details in Di et al. (2009) 
with MFPCA, which is a special case of the methodology considered in this paper. Here we 
provide results for the phonetic study and the accelerometer data. 

5.1 Phonetic study 

The phonetic study of Luobuzhai Qiang dialect, as described in the Introduction, consists of 
FO-contours from 8 subjects speaking 19 words. Each word contains 3 to 12 different vowels. 
There are 5 different vowels: '9', 'a', 'e', 'i' and 'u'. For each vowel within a particular word, 
pitch is measured at 11 equidistant time points, with time standardized by the length of 
the vowel. This dataset is not high-dimensional, but is treated as a function because it was 
assessed to be smooth (Aston et al. , 2010). To assess the effect of covariates with relatively 
simple specification, Aston et al. (2010) assumes the same principal components for all latent 
processes. Covariates only influence the pitch curves through principal scores - weights of 



20 



each principal components. Here we relax these assumptions and attempt to fully evaluate 
the variability and effect of each latent process that is indicated by the data structure. 
Because each word was repeated under three different contexts by each speaker, we take a 
two-way crossed with sub-sampling (C2s) model as in Table 1; we absorb Wij{t) into Uijk{t) 
because the sample size is not large enough to estimate Wij{t) accurately. In fact, our model 
can be written as Yijk{t) = fi{t) + Xi(t) + Zj(t) + Uijk{t), with the first-level random effect 
Xi{t),i = 1,2,- ■■ ,8 accounting for variation among speakers. We choose the second-level 
random effect Zj{t),j = 1,2, ■■ ■ ,45 as the heterogeneity induced by the nature of vowels 
and words together. Based on our observation from Figure 2, vowels (x-axis) and words 
(line symbols) demonstrate their own generic pattern, which are not necessarily additive; 
Uijk{t), k = 1, 2, ■ ■ ■ ,nij contain all the remaining variation within a specific speaker/word 
cell, including noise or measurement error. By applying the SFPCA algorithm, we extract 
the principal components as shown in Figure 7. 
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Figure 7: Principal components for process Xi, Zj andUij using two-way crossed model with sub-sampling 
(C2s). The first row show the first 4 principal components for the speaker- specific effect Xi{t), while the 
second row display results for the word/vowel effect Zj{t). The proportion of variation explained by each 
principal components are listed in the plots. The estimated percentage of variation explained by each latent 
process is listed in front of the corresponding row. 
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For the speaker-specific process, most of the variation (about 99.8%) of Xj(t) is explained 
by the first PC which is essentially constant. Similarly, the first PC of the word/vowel-specific 
process Z{t) remains constant. This finding is consistent with the original one of Aston et al. 
(2010): most of the variation arises from the mean 'shift' of the data. However, instead 
of modeling the overall PC scores to determine that the 'shift' is speaker- and word- de- 
pendent, we can claim that $^(t) corresponds to speaker heterogeneity and $f (t) accounts 
for word/vowel difference. If we further care about the effects of speaker- or word-related 
covariates, we can fit regression models specifically to principal scores of each latent process. 
Also, the fact that the 2^^^^ - 4*"^ PCs of Zj{t) explain more variability than those of Xj(t) indi- 
cates more complex structures induced by inherent vowel and word variability. Furthermore, 
with SFPCA, the relative effect size of speaker and word/vowel can be measured through 
the portion of variation in the whole dataset that each process explains (42.8% vs. 34.4% 
in Figure 7). Naturally, we are able to quantify the size of noise and measurement error 
through Uij{t). Note that we can easily characterize the reliability of speakers across the 
words spoken as Rxz = trace(Kx + Kz)/trace(Kx + Kz + Ku) (Shou et al. , 2012), which 
for this data set was 77.2%. The very interesting analysis provided by Aston et al. (2010) 
for this data analysis could not have shown these patterns or reliability estimates, as this 
would require an explicit modeling of the functional space. Instead, they focused on principal 
components that were common across processes and analyzed the scores using mixed effects 
models for scalar observations. Thus, their principal components are linear combinations of 
those shown in Figure 7, which may create confusion in the interpretation stage. The two 
approaches are complementary and should be pondered in particular applications. 

5.2 Accelerometer data 

In the accelerometer study, each participant has their activity intensity values recorded for 
5 days during active periods (after waking up and before bedtime). Their active periods are 
identified using methods developed by Bai et al. (2012). Bai et al. (2012) mainly focuses 
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on predicting movement type based on the three-axis accelerometer records. Here we are 
more interested in using the same dataset to assess the variabihty of energy expenditure in 
the population and from day to day. As Figure 1 indicates a periodic pattern every hour, 
we model the observed curves into three hierarchies: hours within days within each subject. 
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Figure 8: Principal components for process X , U , and W using three way nested model (N3). The pro- 
portion of variation explained by each PC component is listed in the plots. The first row show the first 4 
PC components for the patient- specific effect X{t), the second row display results for the day-specific effect 
U(t) and the third row are estimated principal components for hour-specific effect W{t). The proportion of 
variation explained by each latent process is labeled on the left side. 

The three-way nested model (N3) is applied to decompose the variance of the data. For 
the original dataset which contains 36, 000 measurements per hour, we can implement SF- 
PCA using methods described in Section 3.3 for high-dimensional data. However, because 
data are densely sampled, it is more informative to smooth the data by averaging energy 
expenditure within every minute and conduct SFPCA on the summarized data. For sim- 
plicity, we also truncate the observations at the end of the study that does not complete 
an entire hour. Therefore, there are 60 measurements for every curve with a maximum of 
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19 curves per day for each subject. The first 4 principal components for the three levels 
of latent processes are displayed in Figure 8. The first component for the patient-specific 
process X{t) accounts for the heterogeneity in the population. While the remaining few 
demonstrate either one-peak or double-peak energy expenditure pattern within one hour. 
Compared to subject-specific and hour-specific effects, the day-to-day variation (8.3%) ac- 
counts for a much smaller portion of the total variability. The majority (about 76%) of the 
total heterogeneity is contained in the hour-to-hour variation. This tells in a quantitative 
way that people follow a similar routine everyday, but their energy expenditure change dra- 
matically within one day, depending on the type of activity they are involved in during a 
particular hour. The reliability ratio can also be evaluated as in the previous example. 

6 Discussion 

The defining characteristic of many functional studies is the existence of a specific structure 
in correlations vis-a-vis the experimental design, which can directly affect inference. Thus, 
there is an increasing demand for methods that 1) respect study design; 2) model multiple 
levels of variation; 3) are computationally feasible in high dimensions. In response to this 
demand, we have introduced a class of structured functional models that includes nested 
and crossed designs, and proposed a statistical framework, SFPCA, that estimates these 
models. Given the non-correlation assumption of latent processes, the covariance structure 
of the observed outcome is fully captured by the variance operators of the random processes. 
SFPCA is a set of efficient approaches that estimate and analyze the covariance structure 
using a uniform protocol for all the models. It uses functional PCA for dimensionality 
reduction and feature extraction. 

The extensive simulation studies clearly demonstrated a great potential of the methodol- 
ogy to recover level-specific features of the latent processes. When we applied SFPCA to two 
studies that collected accelerometeric and phonetic data, we were able to distinguish various 
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layers of effects inherent in the data. Similar to Section 5 in Koch (1967), our method can 
also be extended to the cases when the covariance matrix are heterogeneous and differ across 
levels. 

Future work should focus on developing more efficient unbiased method of moment es- 
timators that are adaptable to unbalanced designs. The development of combined method- 
ology that infuses both 'naked' (nested/crossed) design-induced structures, with covariate- 
driven parts such as the one proposed in (Greven et al. , 2010), is an important, although 
challenging step in expanding this framework. Our methodology has a few potential limita- 
tions. Two most important ones are more rigorous treatment of noise (Di et al. , 2009) as 
well as possible sparsity in the functional observations Di et al. (2011). 



Acknowledgement 

We thank Dr. John Aston for kindly providing us the phonetic study data and for his 
inspiring thoughts on the application of SFPCA. 



25 



A Finding principal scores for three-way nested and 
two-way crossed design 



Here, we provide details of calculating principal scores in (N3) and (C2) models. We assume 
noise-free scenario and follow Zipunnikov et al. (2011). We can write model (N3) as Yj = 
Bu,-, where B = iB,,|B,,|Bwl and u, = (^i^U.^U,^')'- B^ = 1„,. ® $y, B,, = I 



>x\^u\^w\ 



'X, -0(7 



$(7) and B 



w 



I. 



^w So the BLUP of Uj would be 



Ui 



B'B)^B'Yi 

/ 



-1 



n,;.I 



i-^Ni -I- J 



I'r fnio$V$ 



I 



ij^X^U) 



V 



r$V$ 



'1' 



X^W) 



U^W) 



'-rii.Ni 



( 



I 



^xYiln, 



\ 



vec{$uYi(ln, 8)Ij)} 
\^ vec{$wYi} j 



ni.lNi J- J 



1't riijCxu 

nijlN2) I J 



IL. 8)C 



rii 



XW 



^ '^ (A^^)'SV2U',1„,. ^ 



1' 



Uij 



C 



uw) 



\ 



I 



Ui.Ns 



vec{(A^^)'SV2u^(l„^^^I,)} 
vec{(A(^3)'si/2u'J 



where Cxc/ = (A;^^)'A^^ and Cxw, Cuw are similarly defined. 

For two-way crossed model (C2) with balanced design, Y = Bu, where Y = (Y'^^^, ■ ■ ■ , Y^j)'. 
B = [Bx|Bz|Bp^], where Bx = 1/ ® (Ij ® $x), B^ = 1/ ® (Ij ® $z) and B^/ = 
hj^^W, u = K|u^|u^)' andu^ = (^/', ■ ■ ■ ,^7'''), u^ = (^/', ■ " " ,0^') and u^ = 
(^ii^V--,^/j^T- The BLUP gives 



26 



u 



B'B)-iB'Y 



-1 



I/JW3 



/ 



V 



Jl/iVi (l/l'j)®Cxz (1/ 8) I'j) ® CxH^ 



IljN^ (1; Ij) 8) Czw 

^IJN:i 



-' / 



vec{$'zY(li®Ij)} 
vec($WY) y 

vec{(A^^)'SV2u'(Ii®lj)}' 

vec{(A^^)'SV2u'(li®Ij)} 

1^ vec{(A^«)'SV2u'} y 



Again, Cxz = ^'x^z = (A/) A^^^ same for Cxw and C^h^. 

B MoM estimators for additional models 

B.l Multi-way nested model (NM) 

The covariance operators of latent processes in multi-way nested model satisfy 
E[{Yi,^,...Ut) - Yh,h,-hAt)}{yH'.2-Us) - Yh,h,-hAs)}^] 
= 2Kr{t, s), if ii = hi, ■ ■ ■ , ir-l = h^-^i, V 7^ ^r 

= 2{Kr^l{t,s) + Kr{t,s)}, iHi = hi,--- ,ir-2 = hr-2,ir-l^ hr-l 

= 2{Ki{t,s) + --- + Kr{t,s)}, ifii^hi 
Therefore, we have Hj{j = 1, 2, • • • , r) operators as 
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where /cj = J2i-^i2-ir- ■ '"'M2--ir- '^ = 1)2, ■ ■ ■ ,r. The covariance operators are represented as 



K-r+i-j — {Hj+i — Hj)/2 



Y{— — -(D,+i - Ej_,,E,+i - D, + EjE,) - , _, (D, - EJE, - D,_i + EJ_,E,_i)}Y^ 



B.2 Two-way crossed design with sub-sampling (C2s) 

With model Yi,k{t) = /i(t) + X,(t) + Z,(t) + Wij{t) + U^jk{t), i = 1, 2, ■ ■ ■ , /; j = 1, 2, ■ ■ ■ , J 
and /c = 1, 2, ■ ■ ■ , njj, we have 

E{Y,jk{t) - YiUt)}{{Yijk{s) - Yiu.{s)}^ 
= 2Ku{t,s), iii = l,j = u,k^v 

= 2{Kz{t,s) + Kw{t,s) + Ku{t,s)}, ifi = lj^u 

= 2{Kx{t,s) + Kw{t,s) + Ku{t,s)}, ifi^lj = u 

= 2{Kx{t,s) + Kz{t,s) + Kw{t,s) + Ku{t,s)}, iii^kj^u 
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The corresponding H operators are 
Hu = ^^VV(y^,fc-r^i.)(y.,fc-r,,-.)^ = -^— Y(Di2-Ef2Ei2)Y'^ 

Hz = Y^^V ^ Y.Y.^^iik - ^^uvW^jk - Yiu.f = ^ ^^ Y(Di - Ef El - Di2 + Ef2Ei2)Y^ 

i=l jy^U k,v 



Hx = TrA-EEE(^^^-^-^'^--)(^^^-^-^'^- 



, , / ./ ./ A-ijft. -iju/y-ijn. -ijv) — ^ ; — Y(D2 — E2 E2 — Di2 + Ei2Ei2)Y 

K2 - K12 tif , k2-ki2 

i^l J k,v 



Ht - o_._.,.^^Y^ ^{^ijk - YiuvWijk - Yiu^f 



n^ -ki-k2 + ki2 . ,, . , , 

J7=j 3^u k,v 



T \^T 



n? -ki-k2 + ki2 



-Y(nl - 11^ - Di - D2 + D12 + Ef El + E2E2 - Ei2E{2)Y 



where ki = ^. nfo, /cs = Ej "-oj^ ^12 = Eij nfj 



B.3 Multi-way crossed design (CM) 

The most general model for crossed design is 

withifc = 1,2, ■■■ ,mfc where /c = 1, 2, ■ ■ ■ ,r, m = 1, 2, ■ ■ ■ ,ni^i^...i/, 1 < g < r, (ji,j2,--- Jq) ^ 
{1, 2, ■ ■ ■ , r}, ji < J2 < ■ ■ ■ < jq and RifJ^^.:'.'^ (t) has variance operator Kj^j,^...j^, Ri^n-iruif) 
has covariance operator Ku- With similar procedure as in the previous sections, we can work 
out the formula for the covariance operators. We omit the details here. 

B.4 Experiments of the Mixed Type 

Given Y.jkiit) = fi{t) + X,{t) + Z,(t) + W,j{t) + dkit) + D^iit) + U.^kit) + V,,i{t) + E.^kiit), 



29 



E{{Y,jki{t) - Yi,j,k'i'{t)){Y,jki{s) - Yej,k'i'{s)f} 

= 2{KD{t, s) + Kv{t, s) + KEit, s)}, if i = i',j = f, k = k',l^l' 

= 2{Kc{t, s) + Ku{t, s) + KE{t, s)}, if i = i', j = f, k^k',l = l' 

= 2{Kc{t, s) + KD{t, s) + Ku{t, s) + Kv{t, s) + Ksit, s)}, if i = i',j = j' , k^k',l^l' 

= 2{Kzit, s) + Kw{t, s) + Knit, s) + Ku{t, s) + Kv{t, s) + KE{t, s)}, if i = i' , j / /, k = k' 

= 2{Kz{t, s) + Kw{t, s) + Kc{t, s) + Koit, s) + Ku{t, s) + Kv{t, s) + KE{t, s)}, if i = i' , j / /, k ^ k' 

= 2{Kx{t, s) + Kw{t, s) + Kc{t, s) + Ku{t, s) + i^y (t, s) + i^s(t, s)}, if i / i', j = /, / = /' 

= 2{Kx{t, s) + Kw{t, s) + Kc{t, s) + i^i5(t, s) + Ku{t, s) + Ky (t, s) + ^^(i, s)}, if i / i',i = /, / / ^ 

= 2{i^x(t, s) + i^z(t, s) + Kw{t, s) + Kc{t, s) + Koit, s) + Ku{t, s) + Kv{t, s) + KE{t, s)}, if i / i',j / / 

So the H operators are expressed as 



Hl — ^ 2 V^ 2 / ^/ X^ijkl - Yijkl')(Yijkl - Yi 



jj,fc jjr'feO Z^i,j,l,k "-ijkl ij^f^ i^ii 



ijkl') 



Hk = ^ 2 Y^ 2 — z^ z^ (Yijki — Yijkii)(Yijki — Yijk'i) 

2-ii,j,l ''ijOl Z^i,j,k,l ijkl j j ; ky^k' 

He = ^ 2 — I 9Y^ 2 — ^v 2 — ^v 2 — ^ ^ {Yijki - Yijk'i') (Yijki -Yijk'i'] 

2^i,j''^ijOO ~^ "^ 2^i,j,l,k''^ijkl 2-yi,j,k''^ijkO l^i,j,l''^ijOl ij /-^j/^--^^/ 

Hj = ^ — "2 — ^^ "2— XI S X] (^^J^^ ~ Yij'ki'){Yijki - Yij'ki'f 

l^i,k "'iOkO ^i,j,k "'ijkO j j^j' i^i'^k 

^■JK = ^^Z?2 _Y- „2 _ V- „2 Ty^ ;;^X]Z] Y^ {Yijkl-Yij'k'i'){Yijki-Yi 



L^i '^iOOO Z^j,i "iiOO Z^i,fc '^iOfcO ^" Z^ 



ra^j. 



ij'k'l') 



,j 'HjOO Z-^i,k "-iOkO "^ ^i,j,k "'ijkO j jyj/ l^l'^ky^k' 



Hi = 2 — Tv ^^2— X] X] X] (^^i^^ ~ Yi'jk'iWijki - Yi'jk'if 

2^j,i'%oi 2^i,j,i 'Hjoi j i^i'i^k',k 

HiL = ^^—2 _^ ^2 _ V «2 Tv ;;2— XIZ] IZ (^iifc;-i1'jfc'«')(>'yM-li'ifc'F)^ 

Z^i ^^OiOO Z^jj- '^hjQQ 2^j,l %jOl + Z^jj- « %0« j^j/ J fc',fc,«7^/' 



-ff/J — 2 I ov^ 2 Zy^ — 2 Zy^ — 2 — Z^ Z^ {Yijki-Yiij'k'i'){Yijki-Yi'j 
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C Additional simulation results 
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Figure 9: The first 3 panels are estimated principal components when there are 20 samples in X{t) and 
200 in Z{t). The bottom 3 panels are results from the same model as in equation (2) when I = J = 200. 
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